Semi-supervised clustering with inaccurate pairwise annotations

Abstract

Pairwise relational information is a useful way of providing partial supervision in domains where class labels are difficult to acquire. This work presents a clustering model that incorporates pairwise annotations in the form of must-link and cannot-link relations and considers possible annotation inaccuracies (i.e., a common setting when experts provide pairwise supervision). We propose a generative model that assumes Gaussian-distributed data samples along with must-link and cannot-link relations generated by stochastic block models. We adopt a maximum-likelihood approach and demonstrate that, even when the supervision is weak and inaccurate, accounting for the relations significantly improves clustering performance. We observe that it also helps detect meaningful groups in real-world datasets that do not fit the original data-distribution assumptions. Additionally, we extend the model to integrate prior knowledge of experts’ accuracy and discuss the circumstances in which the use of this knowledge is beneficial.
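
As a rough illustration of the kind of generative process the abstract describes, the sketch below draws points from spherical Gaussian clusters and then produces noisy must-link / cannot-link annotations whose probabilities depend only on whether the two points share a cluster (a simplified block-model-style assumption). The function name `sample_annotated_data` and the parameters `p_same` and `p_diff` are hypothetical and chosen for this example; this is not the authors' exact parameterization or their inference procedure.

```python
import random


def sample_annotated_data(n, means, sigma, p_same, p_diff, n_pairs, seed=0):
    """Sample Gaussian data plus noisy pairwise annotations (illustrative only).

    means   : list of cluster centers, one list of coordinates per cluster
    sigma   : shared standard deviation of each spherical Gaussian
    p_same  : probability a same-cluster pair is annotated "must-link"
    p_diff  : probability a different-cluster pair is annotated "must-link"
              (an annotation error); otherwise it is annotated "cannot-link"
    n_pairs : number of annotated pairs to draw
    """
    rng = random.Random(seed)
    k = len(means)

    # Latent cluster assignment for every sample (uniform prior, an assumption).
    labels = [rng.randrange(k) for _ in range(n)]

    # Each sample is drawn from the Gaussian of its own cluster.
    points = [[rng.gauss(m, sigma) for m in means[z]] for z in labels]

    # The annotation probability depends only on the pair's cluster memberships.
    annotations = []
    for _ in range(n_pairs):
        i, j = rng.sample(range(n), 2)
        p_must_link = p_same if labels[i] == labels[j] else p_diff
        relation = "must-link" if rng.random() < p_must_link else "cannot-link"
        annotations.append((i, j, relation))

    return points, labels, annotations


if __name__ == "__main__":
    pts, z, ann = sample_annotated_data(
        n=10, means=[[0.0, 0.0], [5.0, 5.0]], sigma=1.0,
        p_same=0.9, p_diff=0.2, n_pairs=5,
    )
    for i, j, rel in ann:
        print(i, j, rel, "(same cluster)" if z[i] == z[j] else "(different clusters)")
```

Setting `p_same` below 1 or `p_diff` above 0 simulates inaccurate experts; with `p_same=1.0` and `p_diff=0.0` every annotation agrees with the latent clusters.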

Similar resources

Semi-supervised Clustering with Pairwise Constraints: A Discriminative Approach

We consider the semi-supervised clustering problem where we know (with varying degree of certainty) that some sample pairs are (or are not) in the same class. Unlike previous efforts in adapting clustering algorithms to incorporate those pairwise relations, our work is based on a discriminative model. We generalize the standard Gaussian process classifier (GPC) to express our classification pre...

A Semi-supervised Text Clustering Algorithm Based on Pairwise Constraints

In this paper, an active learning method that can effectively select pairwise constraints during the clustering procedure is presented. A novel semi-supervised text clustering algorithm is proposed, which employs an effective pairwise constraint selection method. As the samples on the fuzzy boundary are far away from the cluster center in the clustering procedure, they can be easily divided in...

Kernel Optimization using Pairwise Constraints for Semi-Supervised Clustering

A critical problem related to kernel-based methods is the selection of an optimal kernel for the problem at hand. The kernel function in use must conform with the learning target in order to obtain meaningful results. While solutions to estimate optimal kernel functions and their parameters have been proposed in a supervised setting, the problem presents open challenges when no labeled data are...

Semi-supervised Clustering

Clustering is an unsupervised learning problem whose objective is to find a partition of the given data. However, a major challenge in clustering is to define an appropriate objective function in order to find an optimal partition that is useful to the user. To facilitate data clustering, it has been suggested that the user provide some supplementary information about the data (e.g. pairwise ...

Semi-Supervised Projected Clustering

Recent studies suggest that projected clusters with extremely low dimensionality exist in many real datasets. A number of projected clustering algorithms have been proposed in the past several years, but few can identify clusters with dimensionality lower than 10% of the total number of dimensions, which are commonly found in some real datasets such as gene expression profiles. In this paper we...

Journal

Journal title: Information Sciences

Year: 2022

ISSN: 0020-0255, 1872-6291

DOI: https://doi.org/10.1016/j.ins.2022.05.035